surge analysis on IOC tide gauge¶

case study: cyclone FUNG WONG

Install¶

Libraries needed for this notebook:

pip install searvey hvplot utide ipykernel geoviews pyarrow

In [1]:
import searvey
import utide
import pandas as pd
import hvplot.pandas

from shapely.geometry import box
/home/tomsail/work/gist/ioc_surge_analysis/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
In [2]:
ioc_stations = searvey.get_ioc_stations()
In [3]:
bbox = box(100.0, -5.0, 130.0, 25.0)  # lon_min, lat_min, lon_max, lat_max
In [4]:
selected_stations = ioc_stations[ioc_stations.within(bbox)]
selected_stations
Out[4]:
ioc_code gloss_id country location connection added_to_system observations_arrived_per_week observations_expected_per_week observations_ratio_per_week observations_arrived_per_month ... transmit_interval lat lon contacts dcp_id last_observation_level last_observation_time delay interval geometry
45 ambon 68 Indonesia Ambon SZXX01 2008-11-20 12:43:00 9180 10080.0 91 36510 ... 5' -3.683 128.183 Geospacial Agency of Indonesia ( Indonesia ) +... 301434060409970 5.03 13:00 9' 5' POINT (128.183 -3.683)
47 AMTSI <NA> Indonesia Port of Ampana, Central Sulawesi web 2025-08-15 14:11:20 9690 10080.0 96 37857 ... 5' -0.847 121.601 Kepala Badan Meteorologi, Klimatologi, dan Geo... NaN 0.01 13:00 9' 5' POINT (121.601 -0.847)
132 BETSI <NA> Indonesia Port of Beo, North Sulawesi web 2025-08-15 14:26:13 9835 10080.0 98 43470 ... 5' 4.230 126.788 Kepala Badan Meteorologi, Klimatologi, dan Geo... NaN 0.01 13:05 4' 5' POINT (126.788 4.23)
136 bitu 69 Indonesia Bitung SZXX01 2008-11-20 13:48:00 9125 10080.0 91 36675 ... 5' 1.439 125.190 Geospacial Agency of Indonesia ( Indonesia ) +... 300434067058990 8.48 13:00 9' 5' POINT (125.19 1.439)
183 bupo <NA> Indonesia Bungus Port bgan 2020-07-07 09:10:02 -down- 10080.0 0 -down- ... 1' -1.030 100.396 Joint Research Centre ( Europe ) +Ministry of ... BUPO 1.45 -down- 1177d 1' POINT (100.396 -1.03)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1621 txil <NA> Taiwan Xiao Liuqiu web 2024-01-02 15:35:20 168 168.0 100 743 ... 1h 22.353 120.383 Central Weather Administration ( Taiwan ) NaN 0.41 12:00 1h 1h POINT (120.383 22.353)
1622 tyon <NA> Taiwan Yongan web 2024-01-02 15:35:20 168 168.0 100 743 ... 1h 22.819 120.198 Central Weather Administration ( Taiwan ) NaN -999.00 12:00 1h 1h POINT (120.198 22.819)
1672 vung <NA> Viet Nam Vung Tau SZXX01 2007-10-27 11:39:00 8895 10080.0 88 35835 ... 6' 10.340 107.071 Hydro-meteorological and Environmental Station... 300434067059990 3.81 13:00 9' 6' POINT (107.071 10.34)
1706 xish <NA> China Xisha SZCI01 2014-10-02 13:42:13 -down- 10080.0 0 -down- ... 5' 16.830 112.330 China Meteorological Administration ( China ) 11745 1.30 -down- 3134d 5' POINT (112.33 16.83)
1717 zhap 78 China Zhapo SZCI01 2014-10-02 14:12:15 4500 10080.0 45 18491 ... 1' 21.580 111.820 China Meteorological Administration ( China ) 09731 2.59 12:59 10' 1' POINT (111.82 21.58)

70 rows × 25 columns

In [5]:
selected_stations.hvplot.points(
    geo=True,
    tiles=True,
    hover_cols=["ioc_code"],
    title = "stations in the selected region"
)
Out[5]:

detide the stations¶

In [6]:
def surge(ts: pd.Series, lat: float, rsmp: int = None):
    ts0 = ts.copy()
    OPTS = {
        "constit": "auto",
        "method": "ols", # ols is faster and good for missing data (Ponchaut et al., 2001)
        "order_constit": "frequency",
        "Rayleigh_min": 0.97,
        "lat": lat,
        "verbose": True,
    }
    if rsmp is not None:
        ts = ts.resample(f"{rsmp}min").mean()
        ts = ts.shift(freq=f"{rsmp / 2}min")
    coef = utide.solve(ts.index, ts, **OPTS)
    tidal = utide.reconstruct(ts0.index, coef, verbose = OPTS['verbose'])
    return pd.Series(data=ts0.values - tidal.h, index=ts0.index)

download the data¶

create a data folder with raw and surge subfolders

In [7]:
! mkdir -p data/{raw,surge}

let's first download all the station in the bounding box

In [13]:
drop_columns = ["bat"]

def serialize(d): # to export metadata
    out = {}
    for k, v in d.items():
        if v is not pd.NA and not pd.isna(v):
            out[k] = str(v)
    return out
In [19]:
for irow, row in selected_stations.iterrows():
    lat = row.lat
    ioc_code = row.ioc_code
    df_raw = searvey.fetch_ioc_station(
        ioc_code,
        pd.Timestamp.now()-pd.Timedelta(days=90), # we need at least 90 days (in theory we'd need more..) to remove properly tidal constituents
        pd.Timestamp.now()
    )
    df_raw.attrs = {**serialize(dict(row)), **{"signal_type": "raw"}}
    df_raw.to_parquet(f"data/raw/{ioc_code}.parquet")
IOC-bupo: No data. Creating a dummy dataframe
IOC-ms002: No data. Creating a dummy dataframe
IOC-ms004: No data. Creating a dummy dataframe
IOC-ms005: No data. Creating a dummy dataframe
IOC-ms006: No data. Creating a dummy dataframe
IOC-nans: No data. Creating a dummy dataframe
IOC-padn: No data. Creating a dummy dataframe
IOC-sebla: No data. Creating a dummy dataframe
IOC-tanjo: No data. Creating a dummy dataframe
IOC-tdaw: No data. Creating a dummy dataframe
IOC-tfan: No data. Creating a dummy dataframe
IOC-thep: No data. Creating a dummy dataframe
IOC-tluk: No data. Creating a dummy dataframe
IOC-tnan: No data. Creating a dummy dataframe
IOC-xish: No data. Creating a dummy dataframe
In [20]:
for irow, row in selected_stations.iterrows():
    df_raw = pd.read_parquet(f"data/raw/{row.ioc_code}.parquet")
    df_raw.loc["2025-11-01":].dropna().hvplot(
        title = f"detided signal for {row.ioc_code}, {row.location}, lat:{row.lat}, lon:{row.lon}"
    )
Out[20]:
Out[20]:
Out[20]:
Out[20]:
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
Out[20]:
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
Out[20]:
Out[20]:
Out[20]:
Out[20]:
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
Out[20]:
Out[20]:
Out[20]:
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
Out[20]:
Out[20]:
Out[20]:
Out[20]:
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
Out[20]:
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
Out[20]:
Out[20]:
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
Out[20]:
bokeh backend could not plot any Elements in the Overlay.
Out[20]:
:NdOverlay   [Variable]
Out[20]:

detide the station¶

Let's look at the stations of interest

In [21]:
detide_selection = {
    "lega" :"rad",
    "luba" :"prs",
    "quin" :"ras",
    "qing" :"rad",
    "quar" :"flt",
    "mani" :"prs",
    "subi" :"rad",
    "thsi" :"rad",
    "txil" :"rad",}
In [22]:
for irow, row in selected_stations.iterrows():
    if row.ioc_code in list(detide_selection.keys()):
        df_raw = pd.read_parquet(f"data/raw/{row.ioc_code}.parquet")
        df_surge = surge(df_raw[detide_selection[row.ioc_code]].dropna(), lat=row.lat, rsmp=2)
        df_surge.loc["2025-11-01":].hvplot(
            title = f"detided signal for {row.ioc_code}, {row.location}, lat:{row.lat}, lon:{row.lon}"
        )
        df_surge.attrs = dict(row)
        df_surge.attrs = {**serialize(dict(row)), **{"signal_type": "detide"}}
        df_surge.to_frame().to_parquet(f"data/raw/{row.ioc_code}.parquet")
solve: matrix prep ... solution ... done.
prep/calcs ... done.
Out[22]:
solve: matrix prep ... solution ... done.
prep/calcs ... done.
Out[22]:
solve: matrix prep ... solution ... done.
prep/calcs ... done.
Out[22]:
solve: matrix prep ... solution ... done.
prep/calcs ... done.
Out[22]:
solve: matrix prep ... solution ... done.
prep/calcs ... done.
Out[22]:
solve: matrix prep ... solution ... done.
prep/calcs ... done.
Out[22]:
solve: matrix prep ... solution ... done.
prep/calcs ... done.
Out[22]:
solve: matrix prep ... solution ... done.
prep/calcs ... done.
Out[22]:
solve: matrix prep ... solution ... done.
prep/calcs ... done.
Out[22]:
In [23]:
ioc_stations[ioc_stations.ioc_code.isin(list(detide_selection.keys()))].hvplot.points(
    geo=True,
    tiles=True,
    c = 'r',
    s=100,
    hover_cols=["ioc_code"],
    title = "stations that recorded a surge"
)
Out[23]:

look at a specific station¶

In [24]:
station = "lega"
for irow, row in selected_stations.iterrows():
    if row.ioc_code == station:
        df_raw = pd.read_parquet(f"data/raw/{row.ioc_code}.parquet")
        df_raw
        station
        df_raw.loc["2025--01":].dropna().hvplot(
            title = f"signal for {row.ioc_code}, {row.location}, lat:{row.lat}, lon:{row.lon}"
        )
Out[24]:
0
time
2025-08-13 14:16:00 0.172636
2025-08-13 14:17:00 0.172863
2025-08-13 14:18:00 0.174116
2025-08-13 14:19:00 0.173393
2025-08-13 14:20:00 0.174694
... ...
2025-11-11 13:01:00 -0.149741
2025-11-11 13:02:00 -0.149008
2025-11-11 13:03:00 -0.148250
2025-11-11 13:04:00 -0.148467
2025-11-11 13:05:00 -0.147660

106149 rows × 1 columns

Out[24]:
'lega'
Out[24]: